feat(ai): hybrid (semantic) catalog search — semantic=true (AI-057)#361

Open: mrviduus wants to merge 2 commits into main from ai-057-hybrid-search

@mrviduus (Owner)

AI-057 — hybrid (semantic) catalog search (Phase 9)

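GET /search?q=&semantic=true blends two edition rankings with reciprocal rank fusion (RRF): a keyword (FTS) ranking and a vector ranking (cosine distance between the query embedding and editions.embedding). A book that is semantically related to the query but contains none of its keywords can now surface. The response keeps the same PaginatedResult<SearchResultDto> shape, so the change is transparent to the frontend.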
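
As an illustration of the fusion step, here is a minimal Python sketch of reciprocal rank fusion followed by pagination of the fused order. It is not the project's `RrfFusion.Fuse` implementation: the function names, the constant `k=60` (a commonly used default), and the tie-break by id are all assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of edition ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per id, with rank starting at 1.
    k=60 is an assumed default, not a value taken from the PR.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, edition_id in enumerate(ranking, start=1):
            scores[edition_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda eid: (-scores[eid], eid))


def paginate(items, offset, limit):
    """Slice the fused order, so paging happens after fusion, not before."""
    return items[offset:offset + limit]


# Hypothetical edition ids: 104 and 105 are vector-only hits (no keyword match).
fts_ranked = [101, 102, 103]
vector_ranked = [104, 101, 105]

fused = rrf_fuse([fts_ranked, vector_ranked])
page = paginate(fused, offset=0, limit=3)
print(page)  # [101, 104, 102]
```

Edition 101 appears in both lists and ranks first; the vector-only edition 104 still reaches the first page, which is the behavior the description calls out.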

  • HybridCatalogSearch (Application): wide FTS pool + EmbedAsync + editions-cosine rank (AI-055 visibility, param vector → HNSW) + RrfFusion.Fuse on edition_id + paginate the fused order; vector-only hits get card fields + empty highlights.
  • semantic absent/false → pure-FTS path byte-for-byte unchanged (no embed/cost); search-semantic rate limit (20/min) only when semantic=true.
  • No visibility leak (vector side + vector-only fetch both apply catalog filters — integration-asserted, all 4 exclusion classes).
  • Graceful FTS fallback (QA P2): embed/vector failure → FTS-only, never hard-fails; cancellation propagates.
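
The toggle and fallback behavior in the bullets above can be sketched as follows. This is a simplified Python illustration, not the C# orchestrator: `hybrid_search` and its callable parameters are hypothetical names, and the sketch ignores the wide FTS pool, rate limiting, and pagination. Python's `asyncio.CancelledError` stands in for `OperationCanceledException`.

```python
import asyncio
import logging

logger = logging.getLogger("hybrid_search")


async def hybrid_search(query, semantic, fts_search, embed, vector_rank, fuse):
    # Toggle off: call the FTS provider directly, with no embedding call.
    if not semantic:
        return await fts_search(query)
    try:
        vector = await embed(query)
        vector_ids = await vector_rank(vector)
    except Exception:
        # asyncio.CancelledError derives from BaseException, so cancellation
        # is not caught here and propagates to the caller.
        logger.warning("semantic ranking failed; falling back to FTS", exc_info=True)
        return await fts_search(query)
    fts_ids = await fts_search(query)
    return fuse(fts_ids, vector_ids)


async def _demo():
    async def fts(q):
        return [1, 2]

    async def failing_embed(q):
        raise RuntimeError("embedding service down")

    async def rank(v):
        return [3]

    # The embed failure is logged and the FTS result is returned unchanged.
    return await hybrid_search("q", True, fts, failing_embed, rank, lambda a, b: a + b)


fallback_result = asyncio.run(_demo())
print(fallback_result)  # [1, 2]
```

Catching `Exception` rather than a bare `except` is the design point here: ordinary failures degrade to keyword search, while a cancelled request is not turned into a wasted FTS query.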

654 unit tests plus integration tests (pgvector, gated). The StudyBuddy set-equality check is green and docker-compose is clean. status=1 means Published. The frontend toggle will come later.
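
To make the vector-side visibility rule concrete, here is an in-memory Python sketch of what the filtered cosine ranking is meant to guarantee. The real query runs in PostgreSQL with pgvector, where `<=>` is the cosine distance operator. The `Edition` fields, the `vector_rank` signature, and the sample data are assumptions for illustration; the filter conditions follow the ones named in the commit message (site, status=1, embedding not null, language, at least one chapter).

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Edition:
    id: int
    site_id: int
    status: int                          # 1 = Published
    lang: str
    embedding: Optional[Sequence[float]]
    chapter_count: int


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def vector_rank(editions, query_vec, site_id, lang, limit=50):
    # Apply the catalog visibility filters before ranking, so hidden
    # editions can never enter the fused result.
    visible = [
        e for e in editions
        if e.site_id == site_id
        and e.status == 1
        and e.embedding is not None
        and e.lang == lang
        and e.chapter_count > 0
    ]
    visible.sort(key=lambda e: cosine_distance(e.embedding, query_vec))
    return [e.id for e in visible[:limit]]


editions = [
    Edition(1, 1, 1, "en", [1.0, 0.0], 3),   # visible, exact match
    Edition(2, 1, 0, "en", [1.0, 0.0], 3),   # not published
    Edition(3, 2, 1, "en", [1.0, 0.0], 3),   # other site
    Edition(4, 1, 1, "uk", [1.0, 0.0], 3),   # other language
    Edition(5, 1, 1, "en", None, 3),         # no embedding
    Edition(6, 1, 1, "en", [0.0, 1.0], 0),   # no chapters
    Edition(7, 1, 1, "en", [1.0, 1.0], 2),   # visible, further away
]

ranked = vector_rank(editions, [1.0, 0.0], site_id=1, lang="en")
print(ranked)  # [1, 7]
```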

🤖 Generated with Claude Code

Phase 9. GET /search?q=&semantic=true blends keyword (FTS) + vector
(query-embedding vs editions.embedding cosine) edition rankings via RRF,
same PaginatedResult<SearchResultDto> shape (frontend-transparent). A
semantically-related-but-keyword-absent book surfaces — the payoff.

- Orchestrator HybridCatalogSearch (Application, not the FTS provider which
  has no AI deps): wide FTS pool (offset 0) + IEmbeddingService.EmbedAsync
  + editions-cosine rank (AI-055 visibility: site, status=1, embedding NOT
  NULL, lang, EXISTS(chapters); cosine <=> via HNSW, param vector) +
  RrfFusion.Fuse on edition_id + paginate the FUSED order. Vector-only hits
  get title/author/cover + first-chapter fallback + empty highlights.
- semantic absent/false → today's pure-FTS path byte-for-byte unchanged,
  no embed, no cost; search-semantic rate limit (20/min) applies ONLY when
  semantic=true (pure-FTS unthrottled).
- Graceful FTS fallback (QA P2): embed/vector-rank failure → log + verbatim
  searchProvider.SearchAsync (semantic search never hard-fails catalog
  search); OperationCanceledException propagates.

654 unit (fusion granularity, toggle-off passthrough, embed-guard, empty-
vector degradation, embed-failure fallback, cancellation) + integration
(pgvector, gated): keyword-absent semantic hit surfaces, draft/hidden/
other-site/other-lang never appear, pure-FTS control no drift. StudyBuddy
set-equality green; docker-compose clean. Frontend toggle UI = later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mrviduus force-pushed the ai-057-hybrid-search branch 2 times, most recently from 30d7a15 to 53528c8 (June 20, 2026 20:16)

'empty search shows empty state' flaked (3/3) on the AI-057 branch, even though
AI-057's non-semantic path is byte-identical to main (searchProvider.SearchAsync
verbatim) and the change touches no frontend code. The root cause is the test
itself: networkidle + body.toContainText raced SSG->CSR hydration, so body
resolved empty. The test now waits for the actual /search XHR and then for the
.empty-state element (mirroring the robust sibling test). This unblocks AI-057
and stabilizes the suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>